首页> 外文OA文献 >If it ain't broke, don't fix it: Sparse metric repair
【2h】

If it ain't broke, don't fix it: Sparse metric repair

机译:如果没有损坏,请不要修复它:稀疏度量修复

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

Many modern data-intensive computational problems either require, or benefitfrom distance or similarity data that adhere to a metric. The algorithms runfaster or have better performance guarantees. Unfortunately, in realapplications, the data are messy and values are noisy. The distances betweenthe data points are far from satisfying a metric. Indeed, there are a number ofdifferent algorithms for finding the closest set of distances to the given onesthat also satisfy a metric (sometimes with the extra condition of beingEuclidean). These algorithms can have unintended consequences, they can changea large number of the original data points, and alter many other features ofthe data. The goal of sparse metric repair is to make as few changes as possible to theoriginal data set or underlying distances so as to ensure the resultingdistances satisfy the properties of a metric. In other words, we seek tominimize the sparsity (or the $\ell_0$ "norm") of the changes we make to thedistances subject to the new distances satisfying a metric. We give threedifferent combinatorial algorithms to repair a metric sparsely. In one settingthe algorithm is guaranteed to return the sparsest solution and in the othersettings, the algorithms repair the metric. Without prior information, thealgorithms run in time proportional to the cube of the number of input datapoints and, with prior information we can reduce the running time considerably.
机译:许多现代数据密集型计算问题要么需要遵守度量标准的距离或相似性数据,要么从中受益。该算法运行速度更快或具有更好的性能保证。不幸的是,在实际应用中,数据混乱并且值嘈杂。数据点之间的距离远不能满足度量标准。确实,有很多不同的算法可以找到与给定距离最接近的一组距离,这些距离也可以满足一个度量标准(有时存在beucle欧式的额外条件)。这些算法可能会产生意想不到的后果,它们可能会更改大量原始数据点,并更改数据的许多其他特征。稀疏度量标准修复的目的是对原始数据集或基础距离进行尽可能少的更改,以确保结果距离满足度量标准的属性。换句话说,我们力求最小化对距离的更改的稀疏性(或$ \ ell_0 $“范数”),该距离取决于满足度量标准的新距离。我们给出了三种不同的组合算法来稀疏地修复度量。在一个设置中,保证算法返回最稀疏的解,而在其他设置中,算法修复度量。没有先验信息,算法将按与输入数据点数量的立方成正比的时间运行,有了先验信息,我们可以大大减少运行时间。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号